Developer CD Series 1996 February: Tool Chest

home *** CD-ROM | disk | FTP | other *** search

/ Developer CD Series 1996 February: Tool Chest / Apple Developer CD Series Tool Chest February 1996 (Apple Computer)(1996).iso / Tool Chest / Development Tools & Languages / • Other Platforms / PCCTS 1.31 / Documentation / UPDAT130.txt < prev next >

Wrap

Text File | 1995-03-10 | 21.6 KB | 682 lines | [TEXT/MPS ]

PCCTS 1.30 Release Notes Active Authors for 1.30 Release: Terence Parr Parr Research Corporation 5517 Pleasant Ave Minneapolis, MN 55419 parrt@acm.org Russell W. Quong School of Electrical Engineering Purdue University W. Lafayette, IN 47907 quong@ecn.purdue.edu Other authors: Will Cohen, cohenw@ecn.purdue.edu Hank Dietz, hankd@ecn.purdue.edu November 1, 1994 This document describes the 1.30 release of the Purdue Compiler Construction Tool Set (PCCTS). This file describes the changes made to PCCTS 1.23 to arrive at PCCTS 1.30. A number of major new features were added and a few bug fixes were made. We imagine that this will be the last major feature addition to the ANTLR description language before a 2.00 version can be built. The original 1.00 manual and all release notes are required for a complete documentation set for PCCTS. A book is in the works and the papers provided at the ftp site don't hurt. PCCTS is in the public-domain and can be obtained at everest.ee.umn.edu in pub/pccts/1.30. The newsgroup comp.compilers.tools.pccts provides a discussion forum. To receive future release broadcast messages, register yourself by sending email to pccts@ecn.purdue.edu with a ``Subject:'' line of ``register''. The authors make no claims that this software will do what you want, that this manual is any good, or that the software actually works---use PCCTS at your own risk. Bug reports and/or cheery reports of its usefulness are very welcome, however. The maintenance and support of all PCCTS tools is primarily provided by Parr Research Corporation, Minneapolis MN. Please see file PCCTS.FUTURE for more information. All PCCTS tools currently in the public domain will continue to be in the public domain. I. NEW FEATURES The only new features are element labels and parser exception handling, but they significantly increase the capabilities of ANTLR. The programmer now has complete control over how errors (semantic and syntactic) are handled. Further, the element labels are generally useful and should be considered the preferred method for accessing lexical information about tokens. [Please note that parser exception handling is to be considered in alpha-release state]. A. ELEMENT LABELING Actions in an ANTLR grammar may access attributes via labels of the form ``$label'' attached to token and rule references rather than the conventional ``$i'' for some integer i. By using symbols instead of integer identifiers, grammars are more readable and action parameters are not sensitive to changes in rule element positions. The form of a label is: label:element where element is either a token reference or a rule reference. To refer to the attribute (C interface) or token pointer (C++ interface) of that element in an action, use $label within an action or rule argument list. For example, a : t:ID <<printf("%s\n", $t->getText());>> ; [using the C++ interface]. To reference the tree variable associated with element, use #label When using parser exception handling, simply reference ``label'' to attach a handler to a particular rule reference. For example, a : t:b exception[t] default : <<trap any error found during call to 'b'>> ; Labels must be unique per rule as they have rule scope. Labels may be accessed from parser exception handlers. B. PARSER EXCEPTION HANDLING ***** CONSIDERED ALPHA QUALITY STUFF ***** ANTLR has two mechanisms for error reporting and recovery. In the first mechanism, ANTLR automatically generates error messages using a simple, effective heuristic that is sufficient for many applications. However, when more sophisticated error handling is required, say for commercial-quality software, ANTLR now supports a second mechanism called ``parser exception handling'' that provides the flexibility of hand-built reporting and recovery in a convenient framework. Parser exception handling has much in common with C++ exception handling; we do not actually use C++ exceptions in our implementation and, hence, parser exception handling can be used with either the ANTLR C or C++ interface. Parser exception handling provides a unified framework for reporting and recovering from semantic and syntactic errors; note that automatic mechanisms typically do not consider semantic errors. Parser exception handling provides nearly the flexibility of a hand-built parser [We have not played with the semantic recovery yet, but will soon--you cannot define your own error signals at this point]. We illustrate the use of parser exception handlers by demonstrating how they are used to generate a better error message than the automatic mechanism, which simply reports where the error was detected and what was expected. Consider matching the rule stat using the following grammar fragment, stat: "if" expr "then" stat { "else" stat } | "while" expr "do" stat | VAR ":=" expr ";" | "begin" ( stat )+ "end" ; expr: atom ( "\+" atom )* ; atom: INT | FLOAT ; where INT, FLOAT, and VAR are defined as integer, float and identifier tokens. Upon input if 34+ then i:=1; the automatic error mechanism would report line 1: syntax error at "then" missing { INT FLOAT }. However, because we know the context in which the expr production was attempted, an improved error message could be generated; i.e., it could indicate the expression was both in an if-statement and that it was a conditional--as opposed to the right-hand-side of an assignment statement, for example. A better message would be line 1: if-statement: malformed conditional at "then" One way to achieve this error message is to modify the original stat grammar as follows stat: "if" e:expr "then" stat { "else" stat } exception[e] catch MismatchedToken : catch NoViableAlt : << fprintf(stderr, "line %d: if-statement: malformed conditional at \"%s\"\n", zzline, LATEXT(1)); zzconsumeUntilToken(THEN); >> | "while" expr "do" stat ... ; } where THEN is the token type associated with "then"; MismatchedToken and NoViableAlt are predefined error signals. The notation ``e:expr'' attaches the label e to the expr rule reference. Labels allow the exception handler to catch errors encountered specifically for input matched during that reference. Good error handling requires programmer intervention. Automatic mechanisms typically do not perform well, because they cannot easily analyze the state of the parser (e.g., the symbol stack of a table-driven parser or the program counter of a recursive-descent parser). Knowing where to report errors and how to recover from them must be done with a programmer's experience. While more programming effort is required than for automatic mechanisms, ANTLR's parser exception handling provides a convenient, sophisticated mechanism that rivals the flexibility of hand-coded schemes. ANTLR defaults to using the old, automatic mechanism. Any use of the keyword "exception" triggers the new mechanism. We now provide specific ANTLR programming details for parser exception handling. Syntax A group of exception handlers may be specified after any alternative, after the ";" of a rule definition, and default/global exceptions may be specified before the list of rules. The syntax for an exception group is as follows: exception_group : "exception" { Label_in_brackets } ( exception_handler )* { "default" ":" Action } ; exception_handler : "catch" SIGNAL ":" { Action } ; where SIGNAL is one of: NoViableAlt -- None of the alternatives were predicted NoSemViableAlt -- See below MismatchedToken -- Didn't find the token we were looking for You can also use a "default :" clause, which matches any signal, in your exception group. Currently, you cannot define your own signals. You can define multiple signals for a single action. E.g., exception catch MismatchedToken : catch NoViableAlt : catch NoSemViableAlt : << printf("stat:caught predefined signal\n"); consumeUntil(DIE_set); >> If a label (attached to a rule reference) is specified for an exception group, that group may be specified after the end of the ";" rule terminator. ANTLR can still uniquely identify which rule reference to associate the group with; it often makes a rule cleaner to have most of the exception handlers at the end of the rule. For example, a : A t:expr B | ... ; exception[t] catch ... catch ... The NoViableAlt signal only makes sense for exception groups with labels and for rule exception groups. Exception Handler Order of Execution Given a signal, S, the handler that is invoked is determined by looking through the list of enabled handlers in a specific order. Loosely speaking, we say that a handler is "enabled" (becomes active) and pushed on an exception stack when it has been seen by the parser on its way down the parse tree. A handler is "disabled" and taken off the exception stack when the associated grammar fragment is successfully parsed. The formal rules for enabling are: o All global/default handlers are enabled upon parser entry. o Exception handlers specified after an alternative become enabled when that alternative is predicted. o Exception handlers specified for a rule become enabled when the rule is invoked. o Exception handlers tied to a particular rule reference within an alternative are enabled just before the invocation of that rule reference. Disabling rules are: o All global/default handlers are disabled upon parser exit. o Exception handlers specified after an alternative are disabled when that alternative has been parsed completely (successfully). o Exception handlers specified for a rule become disabled just before the rule returns. o Exception handlers tied to a particular rule reference within an alternative are disabled just after the return from that rule reference. Upon an error condition, the parser with throw an exception signal, S; [in the future, the user will be able to throw user-defined exceptions]. Starting at the top of the stack, each exception group is examined looking for a handler for S. The first S handler found on the stack is executed. In practice, the run time stack and hardware program counter are used to search for the appropriated handler. This amounts to the following: o If there is an exception specified for the enclosing alternative, then look for S in that group first. o If there is no exception for that alternative or that group did not specify an S handler, then look for S in the enclosing rule's exception group. o Global/default handlers are like macros that are inserted into the rule exception group for each rule. o If there is no rule exception or that group did not specify an S handler, then return from the enclosing rule with the current error signal still set to S. o If there is an exception group attached (via label) to the rule that just returned, check that exception group for S. o If an exception group attached to a rule reference does not have an S handler, then look for S in the enclosing rule's exception group. This process continues until an S handler is found or a return instruction is executed in starting rule. These guidelines are best described with an example: a : A c B exception /* 1 */ catch MismatchedToken : <<ACTION1>> | C t:d D exception /* 2 */ catch MismatchedToken : <<ACTION2>> catch NoViableAlt : <<ACTION3>> ; exception[t] /* 3 */ catch NoViableAlt : <<ACTION4>> exception /* 4 */ catch NoViableAlt : <<ACTION5>> c : E ; d : e ; e : F | G ; exception /* 5 */ catch MismatchedToken : <<ACTION6>> The following table summarizes the sequence in which the exception groups are tested: INPUT EXCEPTION GROUP TESTING SEQUENCE ACTION EXECUTED D E B 4 ACTION5 A E D 1 ACTION1 A F B 1 ACTION6 A F B 1 ACTION6 C F B 2 ACTION2 C E D 5,2 ACTION3 The global/default handlers are like macro insertions. For example: exception catch NoViableAlt : <<blah blah>> a : A ; exception catch MismatchedToken : <<ack;>> b : B ; This is functionally equivalent to: a : A ; exception catch MismatchedToken : <<ack;>> catch NoViableAlt : <<blah blah>> b : B ; exception catch NoViableAlt : <<blah blah>> [You will soon be able to throw the same or a different signal from an exception handler. This makes sense when you want a "panic" signal to propagate back up the parse tree so that all rules can clean up.] Modifications to Code Generation The following generic changes were made to facilitate exception handling: o Each rule reference now has a signal parameter which returns 0 if no error occurred during that reference or it returns a nonzero signal S. o The MATCH() macro has been modified to signal Mismatched token rather than calling zzsyn(). o When no viable alternative is found, NoViableAlt is signaled rather than calling the zzsyn() routine. o The parser no longer resynchronizes automatically. See below. Semantic Predicates and NoSemViableAlt When the input stream does not predict any of the alternatives in the current list of possible alternatives, NoViableAlt is signaled. However, what happens when semantic predicates are specified in that alternative list? It turns out that the following simple strategy handles this case: ``If NoViableAlt is signaled and at least one semantic predicate (for a syntactically viable alternative) failed, signal NoSemViableAlt instead of NoViableAlt.'' It would be very misleading to just throw NoViableAlt when in fact one or more alternatives were syntactically viable; i.e., the reason that no alternative was predicted was due to a semantic invalidity--a different signal must be thrown. For example, expr : <<P1>>? ID ... /* function call */ | <<P2>>? ID ... /* array reference */ | INT ; exception catch NoViableAlt : <<no ID or INT was found>> catch NoSemViableAlt : <<an ID was found, but it was not valid>> Typically, the programmer would want to give very different error messages for the two different situations. Specifically, giving a message such as "syntax error at ID missing { ID INT }" would be very misleading (i.e., wrong). In future versions, we may enhance this to throw a different signal depending on which predicate failed. Semantic predicates that are not used to predict alternatives do not yet throw signals. You must continue to use the fail-action for that predicate. Resynchronizing the Parser When an error occurs while parsing rule R, the parser will generally not be able to continue parsing anywhere within that rule--it will return immediately after executing any exception code. The one exception is for exceptions with labels. In this case, the parser knows exactly where in the alternative you would like to continue parsing from. To allow manual resynchronization of the parser, you may use two new functions: zzconsumeUntil(X_set) Consume tokens until you see a token in the token class X. ANTLR now generates a packed bitset called X_set for each token class X. zzconsumeUntilToken(T) Consume tokens until you see token T. For example, #tokclass RESYNCH { A C } a : b A | b C ; b : B exception catch MismatchedToken : // consume until FOLLOW(b) <<print error message; zzconsumeUntil(RESYNCH_set);>> ; Soon, you will be able to reference the FOLLOW(R) and FIRST(R) for a rule R to make resynchronization set construction easier. You may also use zzset_el() to test for membership in a token class. int zzset_el(unsigned b, SetWordType *p) For example, <<if ( zzset_el(LA(1), X_set) ) blah blah blah;>> In C++ mode, simply remove the "zz" prefix from the function names. Example The following grammar demonstrates the use of parser exception handling using the C interface (testcpp/13/test.g provides a C++ example). Given input: if a+ then a=b+b; the output should be: invalid conditional in 'if' statement found assignment to a ---------------------------------------------------------------- #header <<#include "charbuf.h">> << main() { ANTLR(rule(), stdin); } >> #token "[\ \t]+" <<zzskip();>> #token "\n" <<zzskip(); zzline++;>> #token THEN "then" #tokclass DIE { "@" "if" ID "else" } rule: ( stat )+ ; stat: "if" t:expr "then" stat { "else" stat } | u:ID "=" expr ";" <<printf("found assignment to %s\n", $u.text);>> ; exception[t] default : << printf("invalid conditional in 'if' statement\n"); zzconsumeUntilToken(THEN); >> exception catch MismatchedToken : catch NoViableAlt : catch NoSemViableAlt : << printf("stat:caught predefined signal\n"); zzconsumeUntil(DIE_set); >> expr: expr1 ("\+" expr1)* ; expr1 : expr2 ("\*" expr2)* ; expr2: ID ; #token ID "[a-z]+" ---------------------------------------------------------------- C. LEXCLASS SAVE/RESTORE [In the words of Ken Weinert:] This section describes the addition of the mode stack operations defined in Tom Moog's NOTES.newbie document. By defining a variable USER_ZZMODE_STACK at compile time, you can include these functions for use in your grammars. By default, the #define ZZSTACK_MAX_MODE is set to 32 which gives a mode stack of depth 64. If this is too shallow, defining ZZSTACK_MAX_MODE to a different value in the makefile will override the default. Four functions are defined with this modification: zzmpush(), zzmpop(), zzsave_mode_stack(), and zzrestore_mode_stack(). zzmpush(int newMode) push the current mode on the stack and make newMode the current mode. zzmpop() make current the previous mode zzsave_mode_stack(int modeStack[], int *modeLevel) save the current mode stack and depth into indicated variables. Analogous to zzsave_[antlr|dlg]_state(). zzrestore_mode_stack(int modeStack[], int *modeLevel) restore the mode stack and depth from indicated variables. Analogous to zzrestore_[antlr|dlg]_state(). II. CHANGES AND BUG FIXES o Tokens may now be labeled rather than using the $i interface. The two are mutually exclusive in the sense that using $i prevents usage of $label. o Added parser exception handling. o Added #ifdef gate to tokens.h and class definition file for C/C++ mode. o Made the #tokdefs file reader ignore gates on the head of files. o You can no longer pass init-actions to blocks like this (A|B)[action]. o There was a bug in the code generation for (...)* loops for k>1. It used a linear approximate lookahead decision instead of full LL(k). It also used the wrong prediction expression for exiting the loop. o Token class X now results in C/C++ bitset X_set rather than zzerr%d. o ANTLR now allows "@" in a token class. o Added consumeUntil and consumeUntilToken functions for resynching parser in exception handlers. o Added testcpp/13 to test exception handling. o A warning is now generated if you reference a non-token element with $i for some integer i in C++ mode. o There was a bug in the code generation for (...)+ loops. It did not put predicates into "do while"--just at front if-gate. E.g., a : ( b )+ ; b : <<pred>>? A; did not work. o For 16 bit machines, the ANTLRTokenBuffer file was using some implementation-dependent pointer manipulation. This has been changed. o Semantic predicates and -gl option did not work because ANTLR tried to insert line information at the end of a line rather that at the beginning where the C/C++ compiler wants it. o Added "const" to the arguments of the zzerr pointer and zzerrstd(). If you have defined a dlg error routine, you must change the definition accordingly. III. A FEW KNOWN BUGS o I would like to make an auto test script for the C++ samples to check their input/output. o I haven't figured out how to handle deallocation of token objects created. Sometimes the programmer has allocated them in a totally different application and doesn't want ANTLR to track and delete them. o FOLLOW(rule) and FIRST(rule) capabilities need to be added to the tokclass declaration. o No warning is given for nonstandard signals used in parser exception handling. o No warning is given if $label is not defined. o Nothing has been done about the failure of semantic predicates with regards to exception handling. o The demand lookahead in C++ mode is still wacked. o Can't do multiple return values in C++ still. o No warning is given if all alts of a block are guesses: (...)?. IV. ACKNOWLEDGEMENTS Thanks to all you folks who tested 1.30 before its release. Thanks to the folks at the First Annual PCCTS workshop (held at NeXT July 1994) who helped refine the syntax for the parser exception handling. Thanks goes to the folks at NeXT for prodding us about good error recovery.